Introduction - clustering analysis of beer review dataset¶

This notebook performs an in-depth clustering analysis on a dataset containing various beer characteristics, including sensory ratings, chemical properties, and user reviews. The goal is to identify meaningful groupings of beers based on different subsets of features.

Key Steps Covered:¶

  1. Optimal Parameter Selection

    • For KMeans, the Elbow Method and Silhouette Score are used to determine the best number of clusters.
    • For Gaussian Mixture Models (GMM), the Bayesian Information Criterion (BIC) is used.
    • For DBSCAN, the k-distance graph helps estimate an appropriate epsilon.
  2. Model Comparison

    • The following clustering models are tested:
      KMeans, GaussianMixture, DBSCAN, HDBSCAN, MeanShift, AgglomerativeClustering.
    • Each model is evaluated using:
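      • Silhouette Score (cohesion and separation; higher is better)
      • Calinski-Harabasz Score (ratio of between-cluster to within-cluster dispersion; higher is better)
      • Davies-Bouldin Score (average similarity of each cluster to its most similar cluster; lower is better)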
  3. Visualization

    • PCA is used to project high-dimensional data into 2D space for visual inspection.
    • Radar plots visualize average feature values across clusters for:
      • Full feature set
      • Feature subsets: Sensory, Profile, Chemical, Reviews
  4. Export

    • Radar plots are saved to projects/proj_3_team_5/plots/ for reporting and further analysis.
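
As a quick orientation on how the three scores from step 2 are read, here is a minimal sketch on synthetic data. The blobs, the `_demo` names, and the choice of three clusters are assumptions for illustration only and are not taken from the beer dataset.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    silhouette_score,
    calinski_harabasz_score,
    davies_bouldin_score,
)

# Synthetic, well-separated blobs (illustrative only, not the beer data).
X_demo, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=42,
)
demo_labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X_demo)

demo_scores = {
    # Range [-1, 1]; higher means points sit closer to their own cluster than to the next one.
    "silhouette": silhouette_score(X_demo, demo_labels),
    # Unbounded above; higher means more between-cluster than within-cluster dispersion.
    "calinski_harabasz": calinski_harabasz_score(X_demo, demo_labels),
    # Non-negative; lower means each cluster is less similar to its closest other cluster.
    "davies_bouldin": davies_bouldin_score(X_demo, demo_labels),
}
for name, value in demo_scores.items():
    print(f"{name}: {value:.3f}")
```

Because the three scores point in different directions (two are maximized, one is minimized), a model ranking has to state which score it uses.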

Importing and data load¶

In [1]:
import os
import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from dotenv import load_dotenv
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN, MeanShift, HDBSCAN
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score, calinski_harabasz_score, davies_bouldin_score
from kneed import KneeLocator

np.random.seed(42)

while any(marker in os.getcwd() for marker in ('exercises', 'notebooks', 'students', 'research', 'projects')):
    os.chdir("..") 
sys.path.append('.')
In [2]:
load_dotenv('projects/proj_3_team_5/.env')
df_path = os.getenv('PREPROCESSED_DATA_DIR')
df_cleaned_path = os.getenv('CLEANED_DATA_DIR')

df = pd.read_csv(df_path)
df_raw = pd.read_csv(df_cleaned_path)

Optimal parameter selection¶

In [3]:
# For KMeans, we use the Elbow method (inertia) and the Silhouette Score to select the optimal number of clusters.
# - The Elbow method helps identify the point where adding more clusters does not significantly reduce the within-cluster sum of squares (inertia), indicating a suitable number of clusters.
# - The Silhouette Score measures how similar an object is to its own cluster compared to other clusters, providing a quantitative metric for cluster quality.
# For GaussianMixture, we use the Bayesian Information Criterion (BIC) to select the optimal number of components.
# - BIC penalizes model complexity while rewarding goodness of fit, making it suitable for model selection in probabilistic clustering like GMM.
# - Lower BIC values indicate a better model, balancing fit and complexity.
inertia = []
silhouette = []
bic = []
n_range = range(2, 11)

for n in n_range:
    kmeans = KMeans(n_clusters=n, random_state=42)
    labels = kmeans.fit_predict(df)
    inertia.append(kmeans.inertia_)
    silhouette.append(silhouette_score(df, labels))
    
    gmm = GaussianMixture(n_components=n, covariance_type='full', random_state=42)
    gmm_labels = gmm.fit_predict(df)
    bic.append(gmm.bic(df))

# Plot KMeans elbow (inertia) and silhouette
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
plt.plot(n_range, inertia, marker='o')
plt.title('KMeans Elbow Method (Inertia)')
plt.xlabel('Number of clusters')
plt.ylabel('Inertia')

plt.subplot(1, 2, 2)
plt.plot(n_range, silhouette, marker='o')
plt.title('KMeans Silhouette Score')
plt.xlabel('Number of clusters')
plt.ylabel('Silhouette Score')
plt.tight_layout()
plt.show()

# Plot GMM BIC
plt.figure(figsize=(6, 4))
plt.plot(n_range, bic, marker='o')
plt.title('GaussianMixture BIC')
plt.xlabel('Number of components')
plt.ylabel('BIC')
plt.tight_layout()
plt.show()


# For DBSCAN, we use the k-distance graph to estimate the optimal value of eps (the neighborhood radius).
# - The k-distance plot helps visualize the distance to the k-th nearest neighbor for each point, sorted in ascending order.
# - The "elbow" in this plot suggests a threshold where points start to become outliers, which is a good candidate for eps.

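k = 7
neigh = NearestNeighbors(n_neighbors=k)
nbrs = neigh.fit(df)
# Querying the training data returns each point as its own nearest neighbor (distance 0),
# so column k-1 is the distance to the k-th point counting the point itself.
# This matches DBSCAN, where min_samples also counts the point itself.
distances, indices = nbrs.kneighbors(df)
k_distances = np.sort(distances[:, k-1])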

plt.figure(figsize=(6, 4))
plt.plot(k_distances)
plt.title('DBSCAN k-distance Graph')
plt.xlabel('Points sorted by distance')
plt.ylabel(f'{k}th Nearest Neighbor Distance')
plt.tight_layout()
plt.show()
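[Figure: KMeans Elbow Method (Inertia) and KMeans Silhouette Score, each plotted against the number of clusters]
[Figure: GaussianMixture BIC plotted against the number of components]
[Figure: DBSCAN k-distance Graph, 7th nearest neighbor distance for points sorted by distance]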

Based on the KMeans parameter tuning:

  • The Elbow Method shows a visible bend around $k = 8$, suggesting diminishing returns in inertia reduction beyond this point.
  • However, the Silhouette Score is highest at $k = 2$ (approximately $0.12$), and drops sharply for higher values of $k$, even becoming negative, which indicates poor cluster separation.

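Because the two criteria disagree, we give priority to the Silhouette Score, which measures separation directly, and set $k=2$. A score of about $0.12$ is still low, so even this partition is only weakly separated.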
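
The silhouette-based choice can also be made programmatically rather than read off the plot. The following is a minimal sketch on synthetic blobs; the data and the `_demo` names are assumptions for illustration, and on the beer data the selected value will differ.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three well-separated synthetic blobs (illustrative only).
X_demo, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=1.0,
    random_state=42,
)

# Score every candidate k and keep the one with the highest silhouette.
sil_by_k = {}
for k_demo in range(2, 7):
    labels_demo = KMeans(n_clusters=k_demo, n_init=10, random_state=42).fit_predict(X_demo)
    sil_by_k[k_demo] = silhouette_score(X_demo, labels_demo)

best_k = max(sil_by_k, key=sil_by_k.get)
print(f"best k = {best_k}, silhouette = {sil_by_k[best_k]:.3f}")
```

Keeping the full `sil_by_k` dictionary makes it easy to check how close the runner-up is before committing to a single value.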

In [4]:
optimal_kmeans = 2
optimal_gmm = 2
optimal_eps = 1250
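
The `optimal_eps` value above was presumably read off the k-distance plot by eye. As a hedged alternative, the bend of a sorted k-distance curve can be located numerically. The sketch below uses a simple "largest gap below the chord" rule on a synthetic curve; the helper `knee_index` and the demo data are hypothetical and are not part of the notebook.

```python
import numpy as np

def knee_index(sorted_values):
    """Index of the knee of an increasing, convex curve.

    Both axes are rescaled to [0, 1]; the knee is the point that lies
    furthest below the straight line joining the first and last point.
    """
    y = np.asarray(sorted_values, dtype=float)
    n = y.size
    if n < 3 or y[-1] == y[0]:
        return n - 1  # degenerate curve: no bend to find
    x_norm = np.linspace(0.0, 1.0, n)
    y_norm = (y - y[0]) / (y[-1] - y[0])
    return int(np.argmax(x_norm - y_norm))

# Synthetic k-distance curve: a long flat part followed by a steep rise.
flat_part = np.linspace(0.0, 1.0, 90)
steep_part = np.linspace(1.0, 10.0, 11)[1:]
k_distances_demo = np.concatenate([flat_part, steep_part])

knee_idx = knee_index(k_distances_demo)
eps_candidate = k_distances_demo[knee_idx]
print(f"knee at index {knee_idx}, eps candidate = {eps_candidate:.2f}")
```

On real data the curve is noisier, so a value found this way is best treated as a starting point and checked against the plot.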

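For Agglomerative Clustering, we use the same number of clusters as determined optimal for KMeans. KMeans is a partitioning method and Agglomerative Clustering is hierarchical, but both take the number of clusters as an input, so using the same value lets us compare them directly.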

For MeanShift and HDBSCAN, we use their default parameter estimation, as these algorithms are designed to infer the number of clusters or density structure from the data.
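
To make "default parameter estimation" concrete for MeanShift: when no bandwidth is passed, scikit-learn estimates one with `estimate_bandwidth`, and the number of clusters found depends strongly on that value. The sketch below illustrates this on synthetic blobs; the data, the `_demo` names, and the factor of 4 are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

# Two synthetic blobs (illustrative only, not the beer data).
X_demo, _ = make_blobs(
    n_samples=200,
    centers=[[0, 0], [10, 10]],
    cluster_std=1.0,
    random_state=42,
)

# The bandwidth MeanShift falls back to when bandwidth=None.
default_bw = estimate_bandwidth(X_demo)

ms_default = MeanShift().fit(X_demo)
# A much narrower kernel (arbitrary factor) tends to split the data into more modes.
ms_narrow = MeanShift(bandwidth=default_bw / 4).fit(X_demo)

n_default = len(np.unique(ms_default.labels_))
n_narrow = len(np.unique(ms_narrow.labels_))
print(f"estimated bandwidth: {default_bw:.2f}")
print(f"clusters with default bandwidth: {n_default}")
print(f"clusters with bandwidth / 4:     {n_narrow}")
```

This sensitivity is worth keeping in mind when a default MeanShift run returns either a single cluster or several hundred.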

PCA visualization and evaluation of clustering algorithms¶

In [5]:
models = {
    'KMeans': KMeans(n_clusters=optimal_kmeans, random_state=42),
    'GaussianMixture': GaussianMixture(n_components=optimal_gmm, covariance_type='full', random_state=42),
    'DBSCAN': DBSCAN(eps=optimal_eps, min_samples=k),
    'HDBSCAN': HDBSCAN(),
    'MeanShift': MeanShift(),
    'Agglomerative': AgglomerativeClustering(n_clusters=optimal_kmeans, linkage='complete')
}

pca = PCA(n_components=2)
X_pca = pca.fit_transform(df)

metrics = {}

plt.figure(figsize=(16, 12))
for i, (name, model) in enumerate(models.items(), 1):
    try:
        labels = model.fit_predict(df)
    except Exception as e:
        labels = np.zeros(df.shape[0])
        print(f"Model {name} failed: {e}")

    plt.subplot(3, 2, i)
    plt.scatter(X_pca[:, 0], X_pca[:, 1], c=labels, cmap='tab10', s=10)
    plt.title(f'{name} Clustering')
    plt.xlabel('PCA 1')
    plt.ylabel('PCA 2')
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    if n_clusters < 2:
        sil = -1
        ch = -1
        db = np.inf
    else:
        sil = silhouette_score(df, labels)
        ch = calinski_harabasz_score(df, labels)
        db = davies_bouldin_score(df, labels)
    metrics[name] = {
        'silhouette': sil,
        'calinski_harabasz': ch,
        'davies_bouldin': db,
        'n_clusters': n_clusters
    }

plt.tight_layout()
plt.show()
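[Figure: PCA 1 vs. PCA 2 scatter plots colored by cluster label, one panel per model (KMeans, GaussianMixture, DBSCAN, HDBSCAN, MeanShift, Agglomerative)]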
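
A 2D PCA scatter plot is only as informative as the share of variance the two components retain. The notebook does not report that share, so the following sketch shows how it could be checked. The synthetic data and the `_demo` names are assumptions for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Hypothetical data: 5 observed features driven by 2 latent directions plus small noise.
latent = rng.normal(size=(500, 2)) * [5.0, 3.0]
mixing = rng.normal(size=(2, 5))
X_demo = latent @ mixing + rng.normal(scale=0.1, size=(500, 5))

pca_demo = PCA(n_components=2).fit(X_demo)
retained = pca_demo.explained_variance_ratio_.sum()
print(f"Variance retained by 2 components: {retained:.1%}")
```

If the retained share is low, clusters that overlap in the plot may still be separated in the full feature space, and the scatter plot should be read with that caveat.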

Selection of best clustering model¶

In [6]:
sensory_cols = ['review_aroma', 'review_appearance', 'review_palate', 'review_taste', 'review_overall']
profile_cols = ['Alcohol', 'Bitter', 'Sweet', 'Sour', 'Salty', 'Fruits', 'Hoppy', 'Spices', 'Malty', 'Astringency', 'Body']
chemical_cols = ['ABV', 'Min IBU', 'Max IBU']
review_cols = ['number_of_reviews']


metrics_df = pd.DataFrame(metrics).T
display(metrics_df)

# Choose the best clustering based on silhouette score (higher is better)
valid_metrics = metrics_df[metrics_df['n_clusters'] > 1]
if not valid_metrics.empty:
    best_model_name = valid_metrics['silhouette'].idxmax()
    print(f"Best clustering model: {best_model_name}")
else:
    best_model_name = metrics_df['silhouette'].idxmax()
    print(f"Best clustering model (by default): {best_model_name}")

best_model = models[best_model_name]
best_labels = best_model.fit_predict(df)

# Radar plot for the best clustering
radar_cols = sensory_cols + profile_cols + chemical_cols + review_cols
radar_cols = [col for col in radar_cols if col in df.columns]

# Compute mean for each cluster
cluster_means = df[radar_cols].copy()  # explicit copy so adding the label column never touches df
cluster_means['cluster'] = best_labels
cluster_means = cluster_means.groupby('cluster').mean()

# Prepare radar plot
categories = radar_cols
N = len(categories)
angles = np.linspace(0, 2 * np.pi, N, endpoint=False).tolist()
angles += angles[:1]  # close the loop

plt.figure(figsize=(8, 8))
for idx, (cluster, row) in enumerate(cluster_means.iterrows()):
    values = row.values.flatten().tolist()
    values += values[:1]  # close the loop
    plt.polar(angles, values, label=f'Cluster {cluster}', linewidth=2)

plt.xticks(angles[:-1], categories, color='grey', size=10)
plt.title(f'Radar Plot of Cluster Means for {best_model_name}', size=15, y=1.08)
plt.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1))
plt.tight_layout()
os.makedirs("projects/proj_3_team_5/plots", exist_ok=True)  # make sure the output directory exists
plt.savefig(f"projects/proj_3_team_5/plots/radar_{best_model_name}_overall.png", dpi=300)
plt.show()

# Add cluster labels to the original dataframe
df_with_clusters = df_raw.copy()
df_with_clusters['cluster'] = best_labels

# Show five sample beers from each cluster only if number of clusters is smaller than 10
n_clusters = len(df_with_clusters['cluster'].unique())
if n_clusters < 10:
    print(f"\n=== Sample Beers from Each Cluster ({best_model_name}) ===")
    for cluster in sorted(df_with_clusters['cluster'].unique()):
        cluster_beers = df_with_clusters[df_with_clusters['cluster'] == cluster]
        print(f"\nCluster {cluster} ({len(cluster_beers)} beers):")
        
        # Sample 5 beers from this cluster
        sample_beers = cluster_beers.sample(n=min(5, len(cluster_beers)), random_state=42)
        display(sample_beers)
else:
    print(f"\nSkipping sample display - too many clusters ({n_clusters})")
                 silhouette  calinski_harabasz  davies_bouldin  n_clusters
KMeans             0.115463         193.196781        3.617928         2.0
GaussianMixture    0.115463         193.196781        3.617928         2.0
DBSCAN            -1.000000          -1.000000             inf         1.0
HDBSCAN           -0.093229          30.772725        3.962694         2.0
MeanShift          0.079885           2.358320        0.640568       410.0
Agglomerative      0.060042          87.414205        5.002694         2.0
Best clustering model: KMeans
[Figure: Radar Plot of Cluster Means for KMeans]
=== Sample Beers from Each Cluster (KMeans) ===

Cluster 0 (1093 beers):
Name Style Brewery Beer Name (Full) Description ABV Min IBU Max IBU Astringency Body ... Hoppy Spices Malty review_aroma review_appearance review_palate review_taste review_overall number_of_reviews cluster
1219 Dogtoberfest Lager - Märzen / Oktoberfest Flying Dog Brewery Flying Dog Brewery Dogtoberfest There is sauerkraut in my lederhosen. I repeat... 5.6 18 25 17 46 ... 69 10 115 3.479839 3.685484 3.527218 3.553427 3.639113 496 0
284 Mai-Ur-Bock Bock - Maibock Einbecker Brauhaus AG Einbecker Brauhaus AG Einbecker Mai-Ur-Bock “Ready for May?” In spring, the Einbecker brew... 6.5 20 38 14 33 ... 74 10 112 3.688525 3.872951 3.827869 3.858607 3.862705 244 0
151 Pitchfork Rebellious Bitter Bitter - English RCH Brewery RCH Brewery Pitchfork Rebellious Bitter The name comes from the Pitchfork rebellion of... 4.3 20 35 34 49 ... 112 12 64 3.423729 3.949153 3.627119 3.711864 3.889831 59 0
2026 Warlock Stout - American Imperial Southern Tier Brewing Company Southern Tier Brewing Company Warlock Imperial stout brewed with pumpkins Warlock is... 8.6 50 80 3 50 ... 9 52 76 3.625000 4.125000 3.875000 4.000000 4.000000 4 0
1810 Gaelic Ale Red Ale - American Amber / Red Highland Brewing Highland Brewing Highland Gaelic Ale A deep amber colored American ale, featuring a... 5.8 25 45 7 23 ... 74 4 57 3.665904 3.821510 3.778032 3.869565 3.964531 437 0

5 rows × 26 columns

Cluster 1 (1502 beers):
Name Style Brewery Beer Name (Full) Description ABV Min IBU Max IBU Astringency Body ... Hoppy Spices Malty review_aroma review_appearance review_palate review_taste review_overall number_of_reviews cluster
1893 Roggenbier Rye Beer - Roggenbier Real Ale Brewing Company Real Ale Brewing Company Roggenbier NaN 4.9 10 20 8 25 ... 15 24 48 3.600000 3.900000 3.766667 3.733333 3.866667 15 1
1075 Löwenbräu Original Lager - Helles Löwenbräu AG Löwenbräu AG Löwenbräu Original NaN 5.2 18 25 22 29 ... 60 7 56 3.097594 3.304813 3.391711 3.407754 3.616310 374 1
783 Deuchars IPA IPA - English The Caledonian Brewing Company The Caledonian Brewing Company Deuchars IPA 4.4% ABV in bottles and 3.8% in cask.\t 4.4 35 60 27 53 ... 90 18 59 3.782051 3.722222 3.735043 3.888889 4.021368 117 1
769 India Pale Ale IPA - English Meantime Brewing Company Limited Meantime Brewing Company Limited India Pale Ale NaN 7.5 35 60 16 42 ... 99 11 84 3.847756 4.097756 3.956731 4.016026 4.028846 312 1
823 Sünner Kölsch Kölsch Gebr. Sünner GmbH & Co. KG Gebr. Sünner GmbH & Co. KG Sünner Kölsch NaN 4.8 18 25 37 32 ... 71 4 69 3.607759 3.702586 3.745690 3.732759 4.051724 116 1

5 rows × 26 columns
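
The radar plot above draws raw cluster means on a shared radial axis. If the input features are on different scales (whether the preprocessed file is already scaled is not stated here), features with large values can dominate the plot. One possible remedy, sketched under that assumption, is to min-max scale each feature across clusters before plotting. The values below are made up for illustration.

```python
import pandas as pd

# Hypothetical cluster means on very different scales (illustrative values only).
cluster_means_demo = pd.DataFrame(
    {
        "review_overall": [3.6, 3.9, 3.75],
        "ABV": [5.0, 8.5, 6.0],
        "number_of_reviews": [120.0, 640.0, 300.0],
    },
    index=pd.Index([0, 1, 2], name="cluster"),
)

# Min-max scale each feature across clusters so every radar axis spans [0, 1].
col_min = cluster_means_demo.min()
col_range = (cluster_means_demo.max() - col_min).replace(0, 1.0)  # guard against constant columns
scaled_means_demo = (cluster_means_demo - col_min) / col_range
print(scaled_means_demo.round(3))
```

The scaled values show only the relative ordering of clusters per feature, so the raw means should still be reported alongside the plot.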

Clustering analysis and visualization by feature groups¶
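
The per-group analysis picks the number of clusters automatically, partly through an elbow heuristic on the KMeans inertia curve. One common way to express "the point where the decrease sharply slows" is the largest second difference of the curve. The sketch below shows that variant on made-up inertia values; the helper name and the numbers are hypothetical.

```python
import numpy as np

def elbow_by_second_difference(ks, inertias):
    """Return the k at which the inertia curve bends most sharply.

    The second difference inertia[i] - 2*inertia[i+1] + inertia[i+2] is
    largest where a steep drop is followed by a flat stretch, and it is
    centered on position i+1.
    """
    ks = list(ks)
    inertias = np.asarray(inertias, dtype=float)
    if len(inertias) < 3:
        return ks[0]  # too few points to measure curvature
    second_diff = np.diff(inertias, n=2)
    return ks[int(np.argmax(second_diff)) + 1]

ks_demo = range(2, 7)
inertia_demo = [100.0, 50.0, 40.0, 35.0, 32.0]  # big drop from k=2 to k=3, then flat
elbow_k_demo = elbow_by_second_difference(ks_demo, inertia_demo)
print(f"elbow at k = {elbow_k_demo}")
```

This heuristic is sensitive to noise in the curve, which is one reason to combine it with a second criterion such as the silhouette score.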

In [7]:
feature_groups = {
    'Sensory': sensory_cols,
    'Profile': profile_cols,
    'Chemical': chemical_cols,
    'Reviews': review_cols
}

for group_name, columns in feature_groups.items():
    metrics = {}
    print(f"\n=== Feature Group: {group_name} ===")
    print("Clustering results for each model:")
    X = df[columns].dropna()
    n_components = min(2, X.shape[0], X.shape[1])
    pca = PCA(n_components=n_components)
    X_pca = pca.fit_transform(X)

    plt.figure(figsize=(20, 12))
    plt.suptitle(f'Feature Group: {group_name}', fontsize=16)

    # Find optimal parameters for the models using the elbow method

    # KMeans: Find optimal number of clusters using elbow and silhouette score
    sse = []
    silhouette_scores = []
    k_range = range(2, min(11, X.shape[0]))
    for k in k_range:
        kmeans = KMeans(n_clusters=k, random_state=42)
        labels = kmeans.fit_predict(X)
        sse.append(kmeans.inertia_)
        # Only compute silhouette if more than 1 cluster
        if len(set(labels)) > 1:
            sil = silhouette_score(X, labels)
        else:
            sil = -1
        silhouette_scores.append(sil)
    # Find elbow point (simple heuristic: where the decrease sharply slows,
    # i.e. where the second difference of the inertia curve is largest)
    if len(sse) > 2:
        elbow_k = k_range[np.argmax(np.diff(sse, 2)) + 1]
    else:
        elbow_k = k_range[0]
    # Find k with maximum silhouette score
    best_sil_k = k_range[np.argmax(silhouette_scores)]
    # Combine: pick k that is closest to elbow_k but also has high silhouette (within 90% of max)
    sil_threshold = 0.9 * max(silhouette_scores)
    candidate_ks = [k for k, sil in zip(k_range, silhouette_scores) if sil >= sil_threshold]
    if candidate_ks:
        optimal_kmeans = min(candidate_ks, key=lambda k: abs(k - elbow_k))
    else:
        optimal_kmeans = elbow_k

    # GaussianMixture: Use same optimal number of components as KMeans
    optimal_gmm = optimal_kmeans

    # DBSCAN: Find optimal eps using k-distance graph (elbow method)
    # NearestNeighbors is already imported at the top of the notebook
    neigh = NearestNeighbors(n_neighbors=2)
    nbrs = neigh.fit(X)
    distances, indices = nbrs.kneighbors(X)
    distances = np.sort(distances[:, 1])
    # Heuristic: take the point of maximum curvature as optimal eps

    kneedle = KneeLocator(range(len(distances)), distances, S=1.0, curve="convex", direction="increasing")
    optimal_eps = distances[kneedle.knee] if kneedle.knee is not None else np.percentile(distances, 90)
    k = 2  # min_samples

    # Agglomerative: Use same optimal number of clusters as KMeans
    optimal_agglom = optimal_kmeans

    models = {
        'KMeans': KMeans(n_clusters=optimal_kmeans, random_state=42),
        'GaussianMixture': GaussianMixture(n_components=optimal_gmm, covariance_type='full', random_state=42),
        'DBSCAN': DBSCAN(eps=optimal_eps, min_samples=k),
        'HDBSCAN': HDBSCAN(),
        'MeanShift': MeanShift(),
        'Agglomerative': AgglomerativeClustering(n_clusters=optimal_agglom, linkage='complete')
    }


    for i, (model_name, model) in enumerate(models.items(), 1):
        labels = model.fit_predict(X)

        plt.subplot(2, 3, i)
        if n_components == 1:
            plt.scatter(X_pca[:, 0], [0]*len(X_pca), c=labels, cmap='tab10', s=10)
            plt.xlabel('PCA 1')
            plt.ylabel('0 (no 2nd PCA component)')
        else:
            plt.scatter(X_pca[:, 0], X_pca[:, 1], c=labels, cmap='tab10', s=10)
            plt.xlabel('PCA 1')
            plt.ylabel('PCA 2')

        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        if n_clusters < 2:
            sil = -1
            ch = -1
            db = np.inf
        else:
            sil = silhouette_score(X, labels)
            ch = calinski_harabasz_score(X, labels)
            db = davies_bouldin_score(X, labels)
    
        plt.title(f'{model_name}')
        metrics[model_name] = {
            'silhouette': sil,
            'calinski_harabasz': ch,
            'davies_bouldin': db,
            'n_clusters': n_clusters
        }

    plt.tight_layout()
    plt.show()

    print("Model metrics for this group:")
    metrics_df = pd.DataFrame(metrics).T
    display(metrics_df)

    # Plot radar plot for the best model in this group
    valid_metrics = metrics_df[metrics_df['n_clusters'] > 1]
    if not valid_metrics.empty:
        best_model_name = valid_metrics['silhouette'].idxmax()
    else:
        best_model_name = metrics_df['silhouette'].idxmax()
    print(f"Best clustering model for {group_name}: {best_model_name}")

    best_model = models[best_model_name]
    best_labels = best_model.fit_predict(X)

    # Compute mean for each cluster
    cluster_means = pd.DataFrame(X, columns=columns)
    cluster_means['cluster'] = best_labels
    cluster_means = cluster_means.groupby('cluster').mean()

    # Prepare radar plot
    categories = columns
    N = len(categories)
    angles = np.linspace(0, 2 * np.pi, N, endpoint=False).tolist()
    angles += angles[:1]  # close the loop

    plt.figure(figsize=(8, 8))
    for idx, (cluster, row) in enumerate(cluster_means.iterrows()):
        values = row.values.flatten().tolist()
        values += values[:1]  # close the loop
        plt.polar(angles, values, label=f'Cluster {cluster}', linewidth=2)

    plt.xticks(angles[:-1], categories, color='grey', size=10)
    plt.title(f'Radar Plot of Cluster Means for {best_model_name} ({group_name})', size=15, y=1.08)
    plt.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1))
    plt.tight_layout()
    os.makedirs("projects/proj_3_team_5/plots", exist_ok=True)  # make sure the output directory exists
    plt.savefig(f"projects/proj_3_team_5/plots/radar_{best_model_name}_{group_name.lower()}.png", dpi=300)
    plt.show()

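    # Add cluster labels to the original dataframe
    # (assumes df and df_raw share the same row index; keeps only rows that survived dropna())
    df_with_clusters = df_raw.loc[X.index].copy()
    df_with_clusters['cluster'] = best_labels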

    # Show five sample beers from each cluster only if number of clusters is smaller than 10
    n_clusters = len(df_with_clusters['cluster'].unique())
    if n_clusters < 10:
        print(f"\n=== Sample Beers from Each Cluster ({best_model_name}) ===")
        for cluster in sorted(df_with_clusters['cluster'].unique()):
            cluster_beers = df_with_clusters[df_with_clusters['cluster'] == cluster]
            print(f"\nCluster {cluster} ({len(cluster_beers)} beers):")
            
            # Sample 5 beers from this cluster
            sample_beers = cluster_beers.sample(n=min(5, len(cluster_beers)), random_state=42)
            display(sample_beers)
    else:
        print(f"\nSkipping sample display - too many clusters ({n_clusters})")
=== Feature Group: Sensory ===
Clustering results for each model:
[Figure: "Feature Group: Sensory", PCA 1 vs. PCA 2 scatter plots colored by cluster label, one panel per model]
Model metrics for this group:
                 silhouette  calinski_harabasz  davies_bouldin  n_clusters
KMeans             0.490976        3242.767989        0.769278         2.0
GaussianMixture    0.205715          71.971462        4.538508         2.0
DBSCAN             0.242044           3.420279        5.444213         3.0
HDBSCAN           -0.116752          60.261845        1.972067         4.0
MeanShift          0.237655         397.893592        1.191438         9.0
Agglomerative      0.512676        2762.645432        0.702818         2.0
Best clustering model for Sensory: Agglomerative
[Figure: Radar Plot of Cluster Means for Agglomerative (Sensory)]
=== Sample Beers from Each Cluster (Agglomerative) ===

Cluster 0 (2104 beers):
Name Style Brewery Beer Name (Full) Description ABV Min IBU Max IBU Astringency Body ... Hoppy Spices Malty review_aroma review_appearance review_palate review_taste review_overall number_of_reviews cluster
1223 Munsterfest Lager - Märzen / Oktoberfest Three Floyds Brewing Co. & Brewpub Three Floyds Brewing Co. & Brewpub Munsterfest NaN 6.0 18 25 18 46 ... 50 11 95 3.649733 3.679144 3.745989 3.748663 3.893048 187 0
395 George Brown Ale - American Hill Farmstead Brewery Hill Farmstead Brewery George George was our grandfather’s brother, and Hill... 6.0 25 45 14 68 ... 67 7 133 4.000000 4.083333 4.000000 3.958333 3.916667 12 0
2084 Best Extra Stout Stout - Foreign / Export Coopers Brewery Limited Coopers Brewery Limited Coopers Best Extra Stout Now here's a beer with punch. Coopers Best Ext... 6.3 30 70 5 65 ... 43 10 98 3.753482 3.905292 3.710306 3.892758 3.870474 359 0
327 Mountain Holidays In Vermont Bock - Traditional Rock Art Brewery Rock Art Brewery Mountain Holidays In Vermont ... NaN 5.8 20 30 12 74 ... 47 26 119 3.701299 3.837662 3.805195 3.746753 3.824675 77 0
209 St. Feuillien Blonde Blonde Ale - Belgian Brasserie St. Feuillien Brasserie St. Feuillien St. Feuillien Blonde NaN 7.5 15 30 29 38 ... 63 32 35 3.829787 3.904255 3.723404 3.904255 3.936170 47 0

5 rows × 26 columns

Cluster 1 (491 beers):
Name Style Brewery Beer Name (Full) Description ABV Min IBU Max IBU Astringency Body ... Hoppy Spices Malty review_aroma review_appearance review_palate review_taste review_overall number_of_reviews cluster
2279 Faithfull Ale Strong Ale - Belgian Pale Dogfish Head Brewery Dogfish Head Brewery Faithfull Ale Faithfull Ale is a celebration of Pearl Jam's ... 7.0 20 40 24 48 ... 43 19 47 3.289286 3.632143 3.460714 3.278571 3.300000 140 1
683 Ta Henket Herb and Spice Beer Dogfish Head Brewery Dogfish Head Brewery Ta Henket For this ambitious liquid time capsule, we use... 4.5 5 40 7 25 ... 55 26 60 3.250000 3.490385 3.471154 3.355769 3.365385 52 1
1900 Orkiszowe Rye Beer - Roggenbier Browar Kormoran Browar Kormoran Orkiszowe NaN 5.1 10 20 0 0 ... 0 0 0 3.000000 3.000000 3.000000 3.000000 3.000000 1 1
2438 Benediktiner Weissbier Dunkel Wheat Beer - Dunkelweizen Klosterbrauerei Ettal / Ettaler Klosterbetrieb... Klosterbrauerei Ettal / Ettaler Klosterbetrieb... NaN 5.4 10 15 5 26 ... 11 15 57 3.250000 3.750000 3.250000 3.500000 3.500000 2 1
1871 Station 33 Firehouse Red Red Ale - Irish North Country Brewing North Country Brewing Station 33 Firehouse Red NaN 5.5 20 30 18 47 ... 44 2 114 3.147059 3.705882 3.411765 3.441176 3.411765 17 1

5 rows × 26 columns

=== Feature Group: Profile ===
Clustering results for each model:
[Figure: "Feature Group: Profile", PCA 1 vs. PCA 2 scatter plots colored by cluster label, one panel per model]
Model metrics for this group:
                 silhouette  calinski_harabasz  davies_bouldin  n_clusters
KMeans             0.207494         504.609572        1.465888         6.0
GaussianMixture   -0.012702         175.887219        2.307591         6.0
DBSCAN            -0.056300          13.582412        1.617519         5.0
HDBSCAN           -0.419559          11.526929        1.582308        21.0
MeanShift         -1.000000          -1.000000             inf         1.0
Agglomerative      0.132923         346.715836        1.797764         6.0
Best clustering model for Profile: KMeans
[Figure: Radar Plot of Cluster Means for KMeans (Profile)]
=== Sample Beers from Each Cluster (KMeans) ===

Cluster 0 (551 beers):
Name Style Brewery Beer Name (Full) Description ABV Min IBU Max IBU Astringency Body ... Hoppy Spices Malty review_aroma review_appearance review_palate review_taste review_overall number_of_reviews cluster
2525 Namaste White Belgian-Style Witbier Wheat Beer - Witbier Dogfish Head Brewery Dogfish Head Brewery Namaste A witbier bursting with good karma. Made with ... 4.8 10 20 14 29 ... 32 29 32 3.990291 3.987055 3.896440 3.998382 4.085761 309 0
594 Malmgård Jouluolut Farmhouse Ale - Sahti Malmgårdin Panimo Malmgårdin Panimo Malmgård Jouluolut NaN 4.5 0 0 0 0 ... 0 0 0 3.000000 3.500000 3.500000 2.500000 3.500000 1 0
811 Smetoniška Gira Kvass Vofas-Engelman Vofas-Engelman Smetoniška Gira NaN 1.2 0 0 2 4 ... 1 1 16 3.166667 3.500000 3.666667 3.166667 3.500000 3 0
592 Finlandia Sahti Farmhouse Ale - Sahti Finlandia Sahti Ky Finlandia Sahti Ky Finlandia Sahti NaN 8.0 0 0 7 18 ... 17 6 19 4.500000 3.000000 3.000000 3.500000 3.500000 2 0
1119 Organic Beer Shinshu Sansan Lager - Japanese Rice Yo-Ho Brewing Company Yo-Ho Brewing Company Organic Beer Shinshu Sansan NaN 5.0 6 18 7 7 ... 15 1 18 3.333333 3.000000 3.000000 3.000000 3.500000 6 0

5 rows × 26 columns

Cluster 1 (294 beers):
Name Style Brewery Beer Name (Full) Description ABV Min IBU Max IBU Astringency Body ... Hoppy Spices Malty review_aroma review_appearance review_palate review_taste review_overall number_of_reviews cluster
559 Green's Endeavour Dubbel Dark Ale Dubbel Green's Gluten Free Beers Green's Gluten Free Beers Green's Endeavour NaN 7.0 15 30 13 42 ... 27 16 64 2.935185 3.583333 2.481481 2.509259 2.509259 54 1
208 Blond Blonde Ale - Belgian Brasserie de l'Abbaye du Val-Dieu Brasserie de l'Abbaye du Val-Dieu Val-Dieu Blond NaN 6.0 15 30 33 35 ... 61 33 56 3.771277 3.941489 3.840426 3.856383 3.984043 94 1
1355 Lou Pepe - Gueuze Lambic - Gueuze Brasserie Cantillon Brasserie Cantillon Cantillon Lou Pepe - Gueuze NaN 5.0 0 10 23 23 ... 6 2 5 4.335938 4.113281 4.230469 4.378906 4.406250 128 1
2555 Consecration Wild Ale Russian River Brewing Company Russian River Brewing Company Consecration Dark ale aged in Cabernet Sauvignon barrels wi... 10.0 5 30 31 28 ... 9 9 16 4.366114 4.091232 4.283768 4.473934 4.296801 844 1
2527 Blanche De Chambly Wheat Beer - Witbier Unibroue Unibroue Blanche De Chambly The Blanche de Chambly label features the icon... 5.0 10 20 17 33 ... 42 34 52 3.805526 3.837407 3.756111 3.840064 3.971307 941 1

5 rows × 26 columns

Cluster 2 (576 beers):
Name Style Brewery Beer Name (Full) Description ABV Min IBU Max IBU Astringency Body ... Hoppy Spices Malty review_aroma review_appearance review_palate review_taste review_overall number_of_reviews cluster
1298 Rusty Chain Lager - Vienna Flying Bison Brewing Company Flying Bison Brewing Company Rusty Chain The #1 best selling local craft beer in Buffal... 5.2 15 30 14 45 ... 48 7 80 3.250000 3.562500 3.625000 3.625000 3.625000 8 2
461 Pullman Nut Brown Brown Ale - English Flossmoor Station Restaurant & Brewery Flossmoor Station Restaurant & Brewery Pullman... A traditional english brown ale, very nutty ar... 6.0 15 25 8 96 ... 28 4 177 3.983553 3.917763 3.960526 4.078947 4.046053 152 2
1819 Captain Sig's Northwestern Ale Red Ale - American Amber / Red Rogue Ales Rogue Ales Captain Sig's Northwestern Ale Label of 22oz bottle:10 Ingredients: Pale 2-ro... 6.2 25 45 6 55 ... 107 9 112 3.759939 4.027523 3.808869 3.799694 3.844037 327 2
2145 Black Magic Stout Stout - Irish Dry Empire Brewing Company Empire Brewing Company Black Magic Stout A traditional dry Irish stout, carbonated with... 4.8 30 40 43 100 ... 44 12 114 3.625000 4.375000 3.541667 3.625000 3.750000 12 2
2034 Obsidian Stout Stout - American Deschutes Brewery Deschutes Brewery Obsidian Stout Deep, robust and richly rewarding, this is bee... 6.4 35 60 9 72 ... 67 6 132 4.077964 4.304768 4.121134 4.266753 4.250000 776 2

5 rows × 26 columns

Cluster 3 (564 beers):
Name Style Brewery Beer Name (Full) Description ABV Min IBU Max IBU Astringency Body ... Hoppy Spices Malty review_aroma review_appearance review_palate review_taste review_overall number_of_reviews cluster
1916 Rye IPA Rye Beer Black Market Brewing Co. Black Market Brewing Co. Rye IPA NaN 7.5 10 80 17 31 ... 89 18 75 3.650000 3.750000 3.650000 3.700000 3.700000 10 3
1847 Lavery Imperial Red Ale Red Ale - Imperial Lavery Brewing Company Lavery Brewing Company Lavery Imperial Red Ale BIG. HOPPY. RED. Irish beer gone incognito! Ou... 8.2 55 85 16 37 ... 76 2 44 3.666667 4.166667 3.833333 3.750000 3.833333 6 3
874 Molson Ice Lager - Adjunct Molson Coors Canada Molson Coors Canada Molson Ice NaN 5.6 8 18 21 18 ... 38 3 55 2.309816 2.812883 2.684049 2.625767 2.815951 163 3
1857 O'Hara's Irish Red Red Ale - Irish Carlow Brewing Company Carlow Brewing Company O'Hara's Irish Red NaN 4.3 20 30 15 40 ... 68 2 99 3.505952 3.830357 3.500000 3.511905 3.684524 168 3
1623 Evil Power Pilsner - Imperial Three Floyds Brewing Co. & Brewpub Three Floyds Brewing Co. & Brewpub Evil Power A fortified European-style Pilsner lagered to ... 7.2 30 65 37 50 ... 100 9 82 3.428571 3.803571 3.607143 3.508929 3.464286 56 3

5 rows × 26 columns

Cluster 4 (209 beers):
Name Style Brewery Beer Name (Full) Description ABV Min IBU Max IBU Astringency Body ... Hoppy Spices Malty review_aroma review_appearance review_palate review_taste review_overall number_of_reviews cluster
544 Benediction Dubbel Russian River Brewing Company Russian River Brewing Company Benediction Brown in color, Benediction has notes aromas a... 6.75 15 30 11 47 ... 32 27 58 3.960000 3.880000 4.120000 4.180000 4.280000 25 4
2455 Dancing Man Wheat Beer - Hefeweizen New Glarus Brewing Company New Glarus Brewing Company Dancing Man Wheat If you dream of wheat this brew will get your ... 7.20 10 15 13 46 ... 16 69 59 4.249315 4.236986 4.228767 4.335616 4.371233 365 4
1770 UFO Pumpkin Pumpkin Beer Harpoon Brewery Harpoon Brewery UFO Pumpkin Imagine a pumpkin vine wound its way in a fiel... 5.90 5 70 9 37 ... 27 75 59 3.703125 3.710938 3.562500 3.531250 3.570312 64 4
2566 Samuel Adams Old Fezziwig AleBoston Beer Compa... Winter Warmer Boston Beer Company (Samuel Adams) Boston Beer Company (Samuel Adams) Samuel Adam... Old Fezziwig, Rich & Sweet: Like the character... 5.90 35 50 6 35 ... 10 64 78 3.777236 3.906911 3.747561 3.843089 3.810163 1230 4
1518 Hell's Belle Pale Ale - Belgian Big Boss Brewing Big Boss Brewing Hell's Belle NaN 7.00 20 30 19 36 ... 44 40 46 3.522222 3.627778 3.605556 3.538889 3.650000 90 4

5 rows × 26 columns

Cluster 5 (401 beers):
Name Style Brewery Beer Name (Full) Description ABV Min IBU Max IBU Astringency Body ... Hoppy Spices Malty review_aroma review_appearance review_palate review_taste review_overall number_of_reviews cluster
1955 Railbender Ale Scottish Ale Erie Brewing Co. Erie Brewing Co. Railbender Ale Erie Brewing Company flagship beer features a ... 6.8 9 25 5 40 ... 22 6 103 3.422559 3.594276 3.602694 3.624579 3.664983 297 5
1944 Heavy Horse Scotch Ale Scotch Ale / Wee Heavy Big Sky Brewing Company Big Sky Brewing Company Heavy Horse Scotch Ale NaN 6.7 25 35 12 61 ... 37 14 119 3.675000 3.891667 3.741667 3.691667 3.716667 60 5
71 Holidale Barleywine - American Berkshire Brewing Company Inc. Berkshire Brewing Company Inc. Holidale NaN 9.5 60 100 15 72 ... 69 27 107 3.831325 3.987952 3.906627 3.990964 3.891566 166 5
1467 Thanksgiving Ale Old Ale Mayflower Brewing Company Mayflower Brewing Company Mayflower Thanksgivi... The first and only perennial offering in our C... 6.7 30 65 18 69 ... 51 53 151 3.826389 3.798611 3.881944 3.972222 3.972222 72 5
303 Southampton May Bock Bock - Maibock Southampton Publick House Southampton Publick House Southampton May Bock NaN 6.5 20 38 23 57 ... 43 11 116 3.835227 3.880682 3.977273 4.113636 4.221591 88 5

5 rows × 26 columns

=== Feature Group: Chemical ===
Clustering results for each model:
[Figure: "Feature Group: Chemical", PCA 1 vs. PCA 2 scatter plots colored by cluster label, one panel per model]
Model metrics for this group:
                 silhouette  calinski_harabasz  davies_bouldin  n_clusters
KMeans             0.515126        2293.056586        0.852369         2.0
GaussianMixture    0.469307        1347.788702        1.145447         2.0
DBSCAN             0.248944         312.481734        1.674888        15.0
HDBSCAN            0.429212          31.598181        1.986562       222.0
MeanShift          0.436874        1198.007788        0.772829         3.0
Agglomerative      0.442224        2029.390041        0.983633         2.0
Best clustering model for Chemical: KMeans
[Figure: Radar Plot of Cluster Means for KMeans (Chemical)]
=== Sample Beers from Each Cluster (KMeans) ===

Cluster 0 (2072 beers):
Name Style Brewery Beer Name (Full) Description ABV Min IBU Max IBU Astringency Body ... Hoppy Spices Malty review_aroma review_appearance review_palate review_taste review_overall number_of_reviews cluster
1558 English Ale Pale Ale - English St. Peter's Brewery Co Ltd St. Peter's Brewery Co Ltd St. Peter's English... NaN 4.5 20 40 38 50 ... 98 3 78 3.445455 3.543182 3.577273 3.615909 3.747727 220 0
906 Cypress Honey Lager Lager - American Amber / Red Granville Island Brewery Granville Island Brewery Cypress Honey Lager Brewed in small batches, our Cypress Honey Lag... 4.7 18 30 11 34 ... 26 2 132 3.000000 3.250000 3.339286 3.125000 3.446429 28 0
851 Labatt Blue Lager - Adjunct Labatt Brewing Company Ltd. Labatt Brewing Company Ltd. Labatt Blue Labatt Blue is the best-selling Canadian beer ... 5.0 8 18 16 22 ... 20 2 32 2.406096 2.690280 2.696870 2.672982 3.060956 607 0
1692 Bully! Porter Porter - English Boulevard Brewing Co. Boulevard Brewing Co. Bully! Porter The intense flavors of dark-roasted malt in Bo... 6.0 20 30 10 79 ... 36 14 99 3.804393 4.125523 3.791841 3.991632 4.001046 478 0
482 Red Bird Ale California Common / Steam Beer Portsmouth Brewing Co. / Mault's Brewpub Portsmouth Brewing Co. / Mault's Brewpub Ports... An American Home Run! Named for the 1939 Ports... 4.8 35 45 8 22 ... 26 3 30 3.333333 3.500000 3.250000 3.083333 3.250000 6 0

5 rows × 26 columns

Cluster 1 (523 beers):
Name Style Brewery Beer Name (Full) Description ABV Min IBU Max IBU Astringency Body ... Hoppy Spices Malty review_aroma review_appearance review_palate review_taste review_overall number_of_reviews cluster
2593 Winter Shredder Winter Warmer Cisco Brewers Inc. Cisco Brewers Inc. Winter Shredder NaN 8.8 35 50 15 37 ... 45 67 74 4.125000 3.875000 3.875000 3.750000 4.000000 4 1
2212 Life and Limb Strong Ale - American Sierra Nevada Brewing Co. Sierra Nevada Brewing Co. Life & Limb Brewed in collaboration with Dogfish Head Craf... 10.2 40 100 6 52 ... 36 22 112 3.817507 4.114243 3.947329 3.996291 3.887240 674 1
42 Olde GnarlyWine Barleywine - American Lagunitas Brewing Company Lagunitas Brewing Company Olde GnarlyWine 2011: 10.6% ABV, 69 IBU\t 10.9 60 100 14 49 ... 68 17 89 4.052373 4.143208 4.082651 4.127660 4.016367 611 1
1450 Fourth Dementia - Bourbon Barrel-Aged Old Ale Kuhnhenn Brewing Company Kuhnhenn Brewing Company Kuhnhenn Bourbon Barr... This is our 4th Dementia Olde Ale that has bee... 13.5 30 65 13 61 ... 18 32 124 4.555556 3.941919 4.340909 4.638889 4.474747 198 1
698 Furious IPA - American Surly Brewing Company Surly Brewing Company Furious A tempest on the tongue, or a moment of pure h... 6.7 50 70 11 24 ... 96 3 61 4.374592 4.271650 4.200572 4.379902 4.336601 1224 1

5 rows × 26 columns

=== Feature Group: Reviews ===
Clustering results for each model:
[Figure: "Feature Group: Reviews", single-component PCA scatter plots (PCA 1 on the x-axis, constant 0 on the y-axis) colored by cluster label, one panel per model]
Model metrics for this group:
                 silhouette  calinski_harabasz  davies_bouldin  n_clusters
KMeans             0.683902        8787.141046        0.507592         3.0
GaussianMixture    0.503322        4169.878920        0.578554         3.0
DBSCAN             0.633867         354.255109        1.799954        20.0
HDBSCAN            0.754922         212.036369        1.638289       210.0
MeanShift          0.708887        3923.101971        0.412261         3.0
Agglomerative      0.709907        4861.545776        0.412262         3.0
Best clustering model for Reviews: HDBSCAN
UserWarning: Tight layout not applied. The bottom and top margins cannot be made large enough to accommodate all Axes decorations.
[Figure: Radar Plot of Cluster Means for HDBSCAN (Reviews)]
Skipping sample display - too many clusters (211)
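
One caveat when ranking density-based models by silhouette: the code above excludes the noise label `-1` when counting clusters, but passes all labels to `silhouette_score`, which then treats noise as one more cluster. A hedged alternative is to score only the points that were assigned to a cluster. The helper below is a hypothetical sketch on toy data, not the notebook's method.

```python
import numpy as np
from sklearn.metrics import silhouette_score

def silhouette_without_noise(X, labels, noise_label=-1):
    """Silhouette over non-noise points only; NaN if it cannot be computed."""
    X = np.asarray(X)
    labels = np.asarray(labels)
    mask = labels != noise_label
    n_kept = int(mask.sum())
    n_clusters = len(np.unique(labels[mask]))
    # silhouette_score requires 2 <= n_clusters <= n_samples - 1
    if n_clusters < 2 or n_clusters > n_kept - 1:
        return float("nan")
    return silhouette_score(X[mask], labels[mask])

# Toy data: two tight clusters plus three scattered points labeled as noise.
X_demo = np.array(
    [[0, 0], [0, 1], [1, 0], [1, 1],
     [10, 10], [10, 11], [11, 10], [11, 11],
     [5, 5], [0, 11], [11, 0]],
    dtype=float,
)
labels_demo = np.array([0] * 4 + [1] * 4 + [-1] * 3)

with_noise = silhouette_score(X_demo, labels_demo)
without_noise = silhouette_without_noise(X_demo, labels_demo)
print(f"noise treated as a cluster: {with_noise:.3f}")
print(f"noise excluded:             {without_noise:.3f}")
```

Excluding noise rewards models that discard many points, so the fraction of points labeled as noise should be reported next to the score.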